We provide the supplementary material for the paper titled "Efficient Sparse PCA via Block-Diagonalization" submitted to ICLR 2025.

The environment.yml file lists the dependencies required to run the code in Python. You can create a new environment using the following command:

conda env create -f environment.yml

Within the "code" folder, we provide our source code for Algorithm 5. Note that binary_search_BB.py implements our framework integrated with the Branch-and-Bound algorithm, and binary_search_Chan.py implements our framework integrated with Chan's algorithm. Our numerical tests for Model 1 are implemented in synthetic.py.

The Julia implementation of the Branch-and-Bound method proposed by Berk and Bertsimas (2019) can be accessed at https://github.com/lauren897/Optimal-SPCA.

Regarding the datasets used in Section 5, we refer interested readers to the following sources:

CovColon, LymphomaCov1, Reddit1500, and Reddit2000: For these and additional datasets, we recommend visiting Santanu S. Dey's website (http://www2.isye.gatech.edu/~sdey30/txt_form_data.zip).

Leukemia, Lymphoma (with dimension 4026), and Prostate: We recommend visiting https://stat.ethz.ch/Manuscripts/dettling/ to download the data.
Please note that these datasets need to be transformed first. We dropped the response vector y and saved the remaining matrix to a txt file that uses ',' as the delimiter between numbers.
Our source code then reads the txt file and computes the covariance matrix via np.cov(A.T) in Python.
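
The preprocessing described above can be sketched roughly as follows. This is an illustrative sketch only, not the code shipped in the "code" folder: the helper names save_features and load_covariance, the file name data.txt, and the assumption that the response y is stored in the last column are all hypothetical.

```python
import os
import tempfile

import numpy as np


def save_features(raw, path, response_col=-1):
    """Drop the response column and write the rest as comma-delimited text.

    Assumption: the response y sits in column `response_col` of `raw`.
    """
    A = np.delete(raw, response_col, axis=1)
    np.savetxt(path, A, delimiter=",")
    return A


def load_covariance(path):
    """Read a comma-delimited txt file (rows = samples) and return np.cov(A.T)."""
    A = np.loadtxt(path, delimiter=",", ndmin=2)
    # np.cov treats each row as a variable, so A.T yields a d x d matrix.
    return np.cov(A.T)


# Tiny synthetic example: 6 samples, 3 features, response in the last column.
rng = np.random.default_rng(0)
raw = rng.normal(size=(6, 4))
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")  # hypothetical file name
    A = save_features(raw, path)
    Sigma = load_covariance(path)
```

Here Sigma is the 3 x 3 sample covariance matrix of the feature columns.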

Arcene and Dorothea: We recommend visiting https://archive.ics.uci.edu/dataset/167/arcene to download the data.
We ran our numerical tests on the files named arcene_test.data and dorothea_test.data therein.
These files also need to be converted into txt files that use ',' as the delimiter between numbers.
Our source code then reads the txt file and computes the covariance matrix via np.cov(A.T) in Python.
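
One possible way to perform this conversion is sketched below. It assumes the input is a dense, whitespace-delimited numeric table; a file stored in a sparse format would need a different parser. The helper name to_comma_delimited and the file names in the example are hypothetical and do not come from our source code.

```python
import os
import tempfile

import numpy as np


def to_comma_delimited(src_path, dst_path):
    """Rewrite a whitespace-delimited numeric table as comma-delimited text.

    Assumption: every row of the input has the same number of entries.
    """
    A = np.loadtxt(src_path, ndmin=2)  # default delimiter is any whitespace
    np.savetxt(dst_path, A, delimiter=",")
    return A.shape


# Tiny example with a made-up two-row input file.
with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "example.data")  # hypothetical input name
    dst = os.path.join(tmp, "example.txt")   # hypothetical output name
    with open(src, "w") as f:
        f.write("1 2 3\n4 5 6\n")
    shape = to_comma_delimited(src, dst)
    B = np.loadtxt(dst, delimiter=",", ndmin=2)
```

The resulting txt file can then be read with delimiter ',' as described above.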

GLI85 and GLABRA180: We recommend visiting https://jundongl.github.io/scikit-feature/OLD/datasets_old.html to download the data.
Please note that these datasets need to be transformed first. We dropped the response vector y and saved the remaining matrix to a txt file that uses ',' as the delimiter between numbers.
Our source code then reads the txt file and computes the covariance matrix via np.cov(A.T) in Python.


For the results, we refer readers to Section 5 in our paper.


